Terms & Concepts
Overview
We construct alternative accounting dictionaries to capture the universe of accounting terminology and measure standardization in financial reporting. Each dictionary consists of two components:
- Term Lists: Unique accounting terms that appear in financial reports
- Concept Lists: Granular mappings showing which terms are used to describe the same underlying accounting concepts (i.e., synonyms)
Top-Down (Authoritative Sources): We collect terms from IFRS, US GAAP, and UK GAAP standards, plus specialized accounting dictionaries and the EU’s IATE database. Terms explicitly classified as synonyms are grouped by their underlying concepts. All lists are refined using GPT-based validation and manual checks, then restricted to terminology actually observed in our global corpus of financial reports.
Bottom-Up (XBRL Filings): We extract terms directly from financial statements by parsing XBRL filings on EDGAR. Specifically, we use the Exhibit 101.LAB files, which map XBRL taxonomy tags to the natural-language labels firms actually use in their reports. This captures real-world variation in reporting practice. To reduce noise, we require that 10-K terms appear in at least 20 distinct filings and 20-F terms in at least 5 distinct filings. We also apply a majority disambiguation rule, removing terms that appear in fewer than 5% of filings for a given concept.
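The two noise filters described above can be illustrated with a minimal sketch. It assumes the label files have already been parsed into (filing_id, concept, term) tuples, and it reads the 5% rule as "share of the filings reporting a concept that use this term for it"; both the input layout and that reading are assumptions, and the function and variable names are hypothetical.

```python
from collections import defaultdict

# Thresholds taken from the text above; the dictionary layout is illustrative.
MIN_FILINGS = {"10-K": 20, "20-F": 5}
MIN_SHARE = 0.05  # 5% cutoff within a concept


def filter_terms(observations, form_type):
    """Return the (concept, term) pairs that survive both filters.

    observations: iterable of (filing_id, concept, term) tuples (assumed layout).
    form_type: "10-K" or "20-F", selecting the minimum-filings threshold.
    """
    term_filings = defaultdict(set)     # term -> filings that use the term
    concept_filings = defaultdict(set)  # concept -> filings that report the concept
    pair_filings = defaultdict(set)     # (concept, term) -> filings with that pairing
    for filing_id, concept, term in observations:
        term_filings[term].add(filing_id)
        concept_filings[concept].add(filing_id)
        pair_filings[(concept, term)].add(filing_id)

    kept = set()
    for (concept, term), filings in pair_filings.items():
        # Filter 1: the term must appear in enough distinct filings overall.
        if len(term_filings[term]) < MIN_FILINGS[form_type]:
            continue
        # Filter 2: the term must label the concept in at least 5% of the
        # filings that report that concept.
        if len(filings) / len(concept_filings[concept]) < MIN_SHARE:
            continue
        kept.add((concept, term))
    return kept
```

Counting distinct filings with sets, rather than raw occurrences, keeps a single filing that repeats a label many times from inflating either statistic.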
Term Lists
Download all term lists: 📥 Excel File (2.9 MB)
Top-Down
Source: IFRS, US GAAP, UK GAAP standards, and specialized accounting dictionaries
Bottom-Up (10-K)
Source: ~50,000 U.S. 10-K XBRL filings (2009-2025)
Bottom-Up (20-F)
Source: 20-F XBRL filings from non-U.S. firms using IFRS Taxonomy (2009-2025)
Concept Lists
Download all concept lists: 📥 Excel File (2.7 MB)
Top-Down
Construction: Terms from dictionaries and standards explicitly classified as synonyms are grouped into concepts. Concepts are validated using graph theory (complete graph property) and GPT-based checks to ensure all terms within a concept are truly interchangeable.
Structure: Each row shows a term (TID) and its associated concept (CID), along with the n-gram count.
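The complete graph property mentioned in the construction step can be sketched as follows. The sketch assumes that the authoritative sources yield a list of pairwise synonym statements, and that a concept passes only if every pair of its terms is covered by such a statement; the input format and function names are hypothetical.

```python
from itertools import combinations


def missing_pairs(terms, synonym_pairs):
    """List the term pairs in a concept that lack a synonym link.

    terms: the terms grouped into one concept.
    synonym_pairs: iterable of (term_a, term_b) pairs classified as synonyms
                   (assumed input format; order within a pair is ignored).
    """
    edges = {frozenset(pair) for pair in synonym_pairs}
    unique_terms = sorted(set(terms))
    return [
        (a, b)
        for a, b in combinations(unique_terms, 2)
        if frozenset((a, b)) not in edges
    ]


def is_complete(terms, synonym_pairs):
    """True if the concept's terms form a complete graph under the synonym links."""
    return not missing_pairs(terms, synonym_pairs)
```

A concept with a single term passes trivially, since it has no pairs to check. Returning the missing pairs, rather than only a boolean, makes it easier to see which terms would need review.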
Bottom-Up (10-K)
Construction: Terms are grouped by XBRL taxonomy tags, where each tag represents a distinct accounting concept. Terms linked to multiple tags are assigned to their primary concept using a majority rule (5% threshold).
Structure: Each row shows which term (TID) maps to which XBRL concept (CID). This reveals how U.S. domestic filers describe the same accounting items using different terminology.
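Assigning a term that appears under several tags to one primary concept could look like the sketch below. It assumes filing counts per (term, tag) pair are available and simply picks the tag with the most filings, breaking ties alphabetically; how the 5% threshold enters the actual rule is not specified here, so the sketch does not apply it. The function name and the tag names in the usage note are illustrative.

```python
from collections import Counter, defaultdict


def assign_primary_concept(pair_counts):
    """Map each term to the tag it is most often attached to.

    pair_counts: dict mapping (term, tag) -> number of filings (assumed input).
    Ties are broken by tag name so the result is deterministic.
    """
    by_term = defaultdict(Counter)
    for (term, tag), n_filings in pair_counts.items():
        by_term[term][tag] += n_filings

    return {
        term: min(tags.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for term, tags in by_term.items()
    }
```

For example, a term counted 900 times under a tag named Revenues and 100 times under a tag named SalesRevenueNet would be assigned to Revenues.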
Bottom-Up (20-F)
Construction: Same methodology as 10-K, but using IFRS Taxonomy tags from 20-F filings. This captures how international filers describe accounting concepts.
Structure: Each row shows term-concept mappings based on IFRS Taxonomy, revealing cross-border variation in financial reporting language.
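Because the concept lists share a term-to-concept row layout, recovering the synonym sets amounts to grouping rows by concept. The sketch below assumes each row has been loaded as a dictionary with keys "TID" and "CID"; the actual column headers in the Excel files may differ, and the function names are hypothetical.

```python
from collections import defaultdict


def terms_by_concept(rows):
    """Group term identifiers by concept identifier.

    rows: iterable of dicts with keys "TID" and "CID" (assumed column names).
    """
    concepts = defaultdict(list)
    for row in rows:
        concepts[row["CID"]].append(row["TID"])
    return dict(concepts)


def synonym_sets(rows):
    """Keep only concepts described by more than one term."""
    return {
        cid: tids
        for cid, tids in terms_by_concept(rows).items()
        if len(tids) > 1
    }
```

The number of terms per concept gives a simple view of how much wording varies for a given accounting item.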
————————-
Most Common Terms
Download most common terms: 📥 Excel File (5.1 MB)
Top-Down
Source: IFRS, US GAAP, UK GAAP standards, and specialized accounting dictionaries
Bottom-Up (10-K)
Source: ~50,000 U.S. 10-K XBRL filings (2009-2025)
Bottom-Up (20-F)
Source: 20-F XBRL filings from non-U.S. firms using IFRS Taxonomy (2009-2025)
Most Common Concepts
Download most common concepts: 📥 Excel File (4.8 MB)
Top-Down
Construction: Terms from dictionaries and standards explicitly classified as synonyms are grouped into concepts. Concepts are validated using graph theory (complete graph property) and GPT-based checks to ensure all terms within a concept are truly interchangeable.
Structure: Each row shows a term (TID) and its associated concept (CID), along with the n-gram count.
Bottom-Up (10-K)
Construction: Terms are grouped by XBRL taxonomy tags, where each tag represents a distinct accounting concept. Terms linked to multiple tags are assigned to their primary concept using a majority rule (5% threshold).
Structure: Each row shows which term (TID) maps to which XBRL concept (CID). This reveals how U.S. domestic filers describe the same accounting items using different terminology.
Bottom-Up (20-F)
Construction: Same methodology as 10-K, but using IFRS Taxonomy tags from 20-F filings. This captures how international filers describe accounting concepts.
Structure: Each row shows term-concept mappings based on IFRS Taxonomy, revealing cross-border variation in financial reporting language.